N=1:

rank | frequency | n-gram |
---|---|---|
1 | 26113 | -a |
2 | 24784 | -u |
3 | 21902 | -m |
4 | 21815 | -í |
5 | 21172 | -i |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 11248 | -ch |
2 | 9786 | -ou |
3 | 8642 | -ní |
4 | 7694 | -ho |
5 | 7110 | -la |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 5287 | -ová |
2 | 5000 | -ého |
3 | 4950 | -ých |
4 | 3496 | -ích |
5 | 3339 | -ové |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 2001 | -ných |
2 | 1848 | -ovou |
3 | 1786 | -kého |
4 | 1696 | -ovat |
5 | 1688 | -ného |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 1165 | -ského |
2 | 1156 | -ování |
3 | 1066 | -ovala |
4 | 1063 | -ových |
5 | 1048 | -ovali |

The tables show the most frequent letter n-grams at the ends of words for N=1…5. The procedure parallels that of 2.2.5 Most frequent word beginnings; the aim here is suffix detection rather than prefix detection.
For N=3:
SELECT @pos := @pos + 1 AS pos, xx.*
FROM (SELECT @pos := 0) r,
     (SELECT COUNT(*) AS cnt, CONCAT('-', RIGHT(word, 3)) AS ngram
      FROM words
      WHERE w_id > 100
      GROUP BY RIGHT(word, 3)
      ORDER BY cnt DESC) xx
LIMIT 5;
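
For readers without access to the database, the same count can be sketched in Python for any N. This is a minimal illustration, not the code used to produce the tables: the function name `top_endings`, its parameters, and the toy word list are assumptions, and the input is assumed to be a list of distinct word forms (one entry per row of `words`). Grouping here is case-sensitive, which may differ from the collation used by the database.

```python
from collections import Counter


def top_endings(words, n, k=5):
    """Return the k most frequent word-final letter n-grams.

    Each result row is (rank, frequency, n-gram), mirroring the table columns.
    Assumes n >= 1 and that `words` holds distinct word forms.
    """
    # Like RIGHT(word, n) in MySQL, a word shorter than n contributes
    # the whole word as its "ending".
    counts = Counter(word[-n:] for word in words)
    # most_common() orders by descending count.
    return [
        (rank, freq, "-" + ending)
        for rank, (ending, freq) in enumerate(counts.most_common(k), start=1)
    ]


# Hypothetical toy word list, not taken from the corpus.
sample = ["walking", "talking", "baked", "naked", "ring"]
for row in top_endings(sample, 3):
    print(row)
```

Calling `top_endings(words, n)` with n from 1 to 5 corresponds to the five tables; the filter `w_id > 100` from the query has no counterpart here and would have to be applied when the word list is built.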